1.  Introduction

Consider an individual risk (#1), characterized by an unknown risk
parameter, $\tilde\theta_1$, from which $n_1$ i.i.d. sample observations,
$\mathcal{D}_1 = \{x_{1t};\ (t=1,2,\ldots,n_1)\}$, are available; the problem is to predict a
future observation, say $\tilde w_1 = \tilde x_{1,n_1+1}$. Given the model density,
$p(x_1 \mid \theta_1)$, and prior density, $p(\theta_1)$, finding the forecast density,
$p(w_1 \mid \mathcal{D}_1)$, is then a simple exercise in Bayes' law.

For a variety of simple likelihoods and priors (Jewell [1974][1975a]),
the forecast mean turns out to be a linear function of the data:

(1.1)  $\mathcal{E}\{\tilde w_1 \mid \mathcal{D}_1\} = f_1(\mathcal{D}_1) = (1 - z_1)\, m + z_1 \left(\sum_t x_{1t}/n_1\right)$ ,

where the mixing coefficient,

(1.2)  $z_1 = n_1/(n_1 + (e/d))$ ,

is called the credibility factor, and the three required marginal moments
are:

(1.3)  $m = \mathcal{E}\mathcal{E}\{\tilde x_{1t} \mid \tilde\theta_1\}$ ;  $e = \mathcal{E}\mathcal{V}\{\tilde x_{1t} \mid \tilde\theta_1\}$ ;  $d = \mathcal{V}\mathcal{E}\{\tilde x_{1t} \mid \tilde\theta_1\}$ .

The credibility formula, $f_1(\mathcal{D}_1)$, has an obvious interpretation as a
mixture of the prior mean with the sample mean, according to a learning
curve, $z_1$, which tends towards unity with increasing sample size at a rate
governed by a time constant, $(e/d)$. $f_1(\mathcal{D}_1)$ is a robust formula in the
sense that it is also the best linear least-squares fit to the true
$\mathcal{E}\{\tilde w_1 \mid \mathcal{D}_1\}$ for arbitrary $p(x_1 \mid \theta_1)$ and $p(\theta_1)$ (Bühlmann [1967]).
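As a minimal numerical sketch of (1.1)-(1.2), the forecast can be computed directly from the sample and the three moments. The function name and the example values of m, e, and d below are illustrative assumptions, not quantities taken from the text.

```python
def credibility_forecast(x, m, e, d):
    """Credibility forecast (1.1) for one risk.

    x : list of past observations x_{1t}
    m : prior mean; e : expected process variance; d : variance of the
        hypothetical means (the three moments in (1.3), assumed known here).
    Returns (forecast, z) where z is the credibility factor (1.2).
    """
    n = len(x)
    z = n / (n + e / d)                    # (1.2): time constant is e/d
    sample_mean = sum(x) / n
    return (1 - z) * m + z * sample_mean, z


# Hypothetical example: n = 4 and e/d = 4, so z = 0.5.
forecast, z = credibility_forecast([12.0, 8.0, 10.0, 14.0], m=5.0, e=8.0, d=2.0)
# forecast = 0.5 * 5 + 0.5 * 11 = 8.0
```

With only four observations and a time constant of four, the forecast sits halfway between the prior mean and the sample mean; the weight on the data grows as n increases.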
In many applications, the number of samples from risk #1 will be
small, but there may be additional information available from related
risks, that is, data $\{x_{it}\}$ characterized by a different risk parameter, $\tilde\theta_i$,
but by the same form of model density, $p(x \mid \theta)$ $(i=2,3,\ldots,r)$. For example,
in insurance we may have a portfolio of risks which, a priori, are similar
in nature, as measured by some risk classification scheme. Or, in health
statistics we may have a cohort of apparently similar lives, with the same
ages, heritage, diet, etc. Of course, if these risk parameters were
independent, with the same density $p(\theta)$, then the collateral data would
have no predictive value.

However, if we assume that $(\tilde\theta_1, \tilde\theta_2, \ldots, \tilde\theta_r)$ are exchangeable, we are
able to keep the assumed similar nature of the related risks as well as to
introduce correlation between the risks in a natural way; we do this by
adding another unknown parameter, $\tilde\varphi$, which characterizes additional
uncertainty at the portfolio or cohort level. In other words, the prior
$p(\theta)$ is changed to a conditional prior $p(\theta \mid \varphi)$, and we assume that a hyperprior
density, $p(\varphi)$, is known, from which we can calculate the exchangeable
joint prior:

(1.4)  $p(\theta_1, \theta_2, \ldots, \theta_r) = \int \prod_i p(\theta_i \mid \varphi)\, p(\varphi)\, d\varphi$ .

The resulting three-level structure is called a hierarchical model.

Unfortunately, the number of cases in which explicit analytic results
can be obtained from hierarchical models is quite limited. As far as we
know, the only case in which the predictive individual mean is linear in
the individual and collateral data is in the normal-normal-normal linear
model of Lindley & Smith [1972].
To simplify notation, we shall consider the important special case
model:

(1.5)  $(\tilde x_{it} \mid \mu_i, \varphi) \sim (\tilde x_{it} \mid \mu_i) \sim \mathrm{No}(\mu_i; f)$ ;
       $(\tilde\mu_i \mid \varphi) \sim \mathrm{No}(\varphi; g)$ ;
       $\tilde\varphi \sim \mathrm{No}(m; h)$ ,
       $(i=1,2,\ldots,r)$  $(t=1,2,\ldots)$

As before, we predict a future value of the risk #1, but now using the
total cohort data, $\mathcal{D} = \{x_{it};\ (i=1,2,\ldots,r)(t=1,2,\ldots,n_i)\}$. It will be
apparent later that the predictive mean can now be written as the
combination of two credibility-like forecasts:

(1.6)  $\mathcal{E}\{\tilde w_1 \mid \mathcal{D}\} = f_1(\mathcal{D}) = (1 - z_1)\, f_0(\mathcal{D}) + z_1 y_1$ ;
       $f_0(\mathcal{D}) = (1 - z_0)\, m + z_0 Y_0$ .

Here, the collateral data is organized into r+1 linear sufficient
statistics:

(1.7)  $y_i = \left(\sum_t x_{it}/n_i\right)$ ;  $Y_0 = \left(\sum_i z_i y_i / \sum_i z_i\right)$ ;

and r+1 credibility factors:

(1.8)  $z_i = \frac{n_i}{n_i + (f/g)}$ ;  $z_0 = \frac{\sum_i z_i}{\sum_i z_i + (g/h)}$ ,

for each risk $(i=1,2,\ldots,r)$ and for the portfolio as a whole. $f_0(\mathcal{D})$
turns out to be the predictive mean at the portfolio level, $\mathcal{E}\{\tilde\varphi \mid \mathcal{D}\}$;
additional interpretations will be given below in Section 9.

(1.6) is also the credibility formula (best linear predictive mean in the
least-squares sense) for hierarchical models with arbitrary $p(x \mid \theta)$,
$p(\theta \mid \varphi)$, and $p(\varphi)$ (Jewell [1975b]).
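The two-stage forecast (1.6)-(1.8) can be sketched in a few lines. This is only an illustration under the assumption that the variances f, g, and h are known; the function name and the toy portfolio are hypothetical.

```python
def hierarchical_forecasts(data, m, f, g, h):
    """Two-level credibility forecasts (1.6)-(1.8).

    data : list of per-risk observation lists (lengths n_i may differ)
    m    : universal prior mean
    f, g, h : variances at the observation, risk, and portfolio levels
    Returns (f_i list, f_0, z_i list, z_0).
    """
    n = [len(x) for x in data]
    y = [sum(x) / len(x) for x in data]               # y_i in (1.7)
    z = [ni / (ni + f / g) for ni in n]               # z_i in (1.8)
    Z = sum(z)
    Y0 = sum(zi * yi for zi, yi in zip(z, y)) / Z     # Y_0 in (1.7)
    z0 = Z / (Z + g / h)                              # z_0 in (1.8)
    f0 = (1 - z0) * m + z0 * Y0                       # portfolio forecast
    fi = [(1 - zi) * f0 + zi * yi for zi, yi in zip(z, y)]
    return fi, f0, z, z0


# Hypothetical portfolio of three risks with two observations each.
fi, f0, z, z0 = hierarchical_forecasts(
    [[2.0, 4.0], [6.0, 6.0], [1.0, 3.0]], m=0.0, f=2.0, g=1.0, h=1.0)
# Here every z_i = 0.5, z_0 = 0.6, f_0 = 2.2, and f_1 = 2.6.
```

Each individual forecast shrinks its own sample mean towards the portfolio forecast, which in turn shrinks the credibility-weighted average towards m.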

Thus, it would seem that the normal-normal-normal hierarchical model
would be satisfactory for most situations in which collateral data is
available, since, as we shall see below, the full-distributional
predictive results are also easily obtained. Recently, however, the
author and Hans Bühlmann have been examining credibility approximations
for the second moments of $(\tilde w_1 \mid \mathcal{D})$, in which various second-order statistics
from the portfolio are used to augment a second-moment forecast using
individual data (Jewell & Schnieper [1985], Jewell [1987], Jewell &
Bühlmann [1987]). The least-squares analysis, while messy, is
straightforward. However, (1.5) turns out not to be a useful test case
for these approximate formulae, since all of the posterior second moments
are homoscedastic in the data, by which we mean that they depend only upon
the r+1 sampling design parameters $(n_i, r)$, and not upon the actual data
values. For instance, for all data $\mathcal{D}$:

(1.9)  $\mathcal{V}\{\tilde\mu_i \mid \varphi, \mathcal{D}\} = g(1 - z_i)$ ;  $\mathcal{V}\{\tilde\varphi \mid \mathcal{D}\} = h(1 - z_0)$ ;

and similarly for the predictive variance of the observables (see
Section 4).

This  is,  we  believe,  a  serious  limitation  of  the  normal-normal-normal 
model,  since  we  would  expect  in  a  more  general  setting  to  be  able  to  learn 
about  any  unknown  variances  from  sufficient  portfolio  data.  In  what 
follows,  we  present  our  attempts  to  generalize  (1.5)  in  order  to  obtain 


heteroscedastic  forecast  formulae,  while  retaining  the  simplicity  of  the 
credible  mean  forecast  (1.6).  The  resulting  model  turns  out  to  be  a 
generalization  of  the  classical  unknown  mean  and  precision  normal  model 
for  the  non-hierarchical  case.  The  formulation  also  clarifies  the 
numerical integration that would be necessary to extend the generalization
further.


2.  A  Generalized  Hierarchical  Model 

To obtain heteroscedastic results from the normal hierarchical
model, we must permit the variances at each level to be random quantities.
Specifically, for risk i at time t, we assume:

(2.1)  $(\tilde x_{it} \mid \mu_i, \varphi, \omega, \tau, \eta) \sim (\tilde x_{it} \mid \mu_i, \omega) \sim \mathrm{No}(\mu_i; \omega^{-1})$ ;
       $(\tilde\mu_i \mid \varphi, \tau, \eta) \sim (\tilde\mu_i \mid \varphi, \tau) \sim \mathrm{No}(\varphi; \tau^{-1})$ ;
       $(\tilde\varphi \mid \eta) \sim \mathrm{No}(m; \eta^{-1})$ ,
       $(i=1,2,\ldots,r)$  $(t=1,2,\ldots)$

where $\tilde\omega$, $\tilde\tau$, and $\tilde\eta$ are new random precisions, corresponding to $f^{-1}$, $g^{-1}$,
and $h^{-1}$, respectively. As before, the variable length observed data,
$\mathcal{D} = \{x_{it}\}$, is to be used to estimate all of the unknown parameters, now
denoted by $\tilde\Theta = (\tilde\mu, \tilde\varphi; \tilde\omega, \tilde\tau, \tilde\eta)$, plus make predictions of future $\tilde x_{it}$. $\tilde\mu$
and $\tilde\varphi$ are the conditional mean parameters, as before. We let $\tilde\Omega = (\tilde\omega, \tilde\tau, \tilde\eta)$
denote the group of unknown precision parameters; we assume that they are
statistically independent of the r+1 mean parameters. Temporarily, we
assume that the precisions are governed by some known prior joint density,
$p(\Omega)$.

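Using the statistics $y_i$ in (1.7), and setting $y_{ii} = \left(\sum_t x_{it}^2/n_i\right)$, we
find the joint density of the data and the mean parameters, conditional on
the precisions, as:

(2.2)  $p(\mathcal{D}, \mu, \varphi \mid \Omega) = (2\pi)^{-(\sum_i n_i + r + 1)/2}\, \omega^{\sum_i n_i/2}\, \tau^{r/2}\, \eta^{1/2} \times$
       $\exp\{-\omega \sum_i [n_i y_{ii} - 2\mu_i n_i y_i + n_i \mu_i^2]/2\} \times$
       $\exp\{-\tau \sum_i [\mu_i - \varphi]^2/2 - \eta[\varphi - m]^2/2\}$ .

We then extract the conditional densities of the mean parameters in the
usual tedious way.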

At the individual risk level, we find that the conditional joint
density of the mean parameters, $p(\mu \mid \varphi, \Omega, \mathcal{D})$, is a product of independent
normal densities, with means equal to:

(2.3)  $\mathcal{E}\{\tilde\mu_i \mid \varphi, \Omega, \mathcal{D}\} = f_i(\varphi, \Omega, \mathcal{D}) = [1 - z_i(\omega,\tau)]\, \varphi + z_i(\omega,\tau)\, y_i$ ,

and variances:

(2.4)  $\mathcal{V}\{\tilde\mu_i \mid \varphi, \Omega, \mathcal{D}\} = (\tau + \omega n_i)^{-1} = \frac{1}{\tau}\, [1 - z_i(\omega,\tau)]$ ,

for i=1,2,...,r. In parallel with the simpler model, we have defined
(conditional) individual credibilities:

(2.5)  $z_i(\Omega) = z_i(\omega,\tau) = \left(\frac{\omega n_i}{\tau + \omega n_i}\right)$

and (conditional) individual credibility forecasts $f_i(\varphi, \Omega, \mathcal{D})$, for all
risks. Note that neither the $z_i$ nor the variances depend upon the data $\mathcal{D}$
nor the highest-level precision parameter $\eta$. Thus, we are back to the
homoscedastic case, as expected.

Removing $\mu$ from (2.2), and again completing the square, we find that
the conditional density of the cohort (portfolio) mean, $p(\varphi \mid \Omega, \mathcal{D})$, is also
normal, with a (conditional) cohort credibility forecast:

(2.6)  $\mathcal{E}\{\tilde\varphi \mid \Omega, \mathcal{D}\} = f_0(\Omega, \mathcal{D}) = [1 - z_0(\Omega)]\, m + z_0(\Omega)\, Y_0(\Omega)$ ,

where, as before, we define a (conditional) cohort credibility factor:

(2.7)  $z_0(\Omega) = \frac{\tau \sum_i z_i(\omega,\tau)}{\eta + \tau \sum_i z_i(\omega,\tau)}$ ,

and the (conditional) credibility-weighted average observation:

(2.8)  $Y_0(\Omega) = Y_0(\omega,\tau) = \frac{\sum_i z_i(\omega,\tau)\, y_i}{\sum_i z_i(\omega,\tau)}$ .

(We suppress the obvious dependence of $Y_0$ upon $\mathcal{D}$.) Note that $z_0$ does not
depend upon the data, and depends upon $\omega$ only through the $\{z_i\}$. The
variance of the conditional cohort mean parameter is:

(2.9)  $\mathcal{V}\{\tilde\varphi \mid \Omega, \mathcal{D}\} = \mathcal{V}\{\tilde\varphi \mid \Omega\} = \left(\eta + \tau \sum_i z_i(\omega,\tau)\right)^{-1} = \frac{1}{\eta}\, [1 - z_0(\Omega)]$ ,

which is also homoscedastic, and depends upon $\omega$ only through the $\{z_i\}$.

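The remainder of (2.2) can now be matched with the still-general
prior $p(\Omega)$ to give the conditional posterior joint density of the
precision parameters as:

(2.10)  $p(\Omega \mid \mathcal{D}) \propto p(\Omega)\, \omega^{N/2} \left\{\prod_i [1 - z_i(\omega,\tau)]^{1/2}\right\} \left\{[1 - z_0(\Omega)]^{1/2}\right\} \times \exp\{-A(\Omega, \mathcal{D})/2\}$ ,

where the function A is:

(2.11a)  $A(\Omega, \mathcal{D}) = \omega \sum_i [n_i y_{ii} - n_i z_i(\omega,\tau)\, y_i^2] + \eta \left\{ m^2 - [1 - z_0(\Omega)]^{-1} f_0^2(\Omega, \mathcal{D}) \right\}$ .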

By expansion, and the use of the new definitions:

(2.12)  $N = \sum_i n_i$ ;  $z(\omega,\tau) = N^{-1} \sum_i n_i z_i(\omega,\tau)$ ;
        $Z(\omega,\tau) = \sum_i z_i(\omega,\tau)$ ;  $Y_{00} = N^{-1} \sum_i n_i y_{ii}$ ;

the exponent can be put into several equivalent forms, such as:

(2.11b)  $A(\Omega, \mathcal{D}) = \omega \left[N Y_{00} - \sum_i n_i z_i(\omega,\tau)\, y_i^2\right] + \eta\, z_0(\Omega) [m - Y_0(\omega,\tau)]^2 - \tau Z(\omega,\tau)\, Y_0^2(\omega,\tau)$ ;

(2.11c)  $A(\Omega, \mathcal{D}) = N\omega \left\{ [Y_{00} - Y_0^2(\omega,\tau)] + N^{-1} \sum_i n_i z_i(\omega,\tau) [Y_0^2(\omega,\tau) - y_i^2] + [1 - z(\omega,\tau)][1 - z_0(\Omega)][m - Y_0(\omega,\tau)]^2 \right\}$ .

We apologize for this heavy notation, but it is important for the special
models to follow so that we know explicitly where the various unknown
precisions occur. In any case, it should be obvious from (2.10) and any
of the (2.11) formulas that there is no "magic" prior $p(\Omega)$ that will give
a tractable, closed-form posterior density, $p(\Omega \mid \mathcal{D})$!
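Because the three forms of the exponent in (2.11) are easy to confuse, a quick numerical cross-check may be helpful. The sketch below evaluates (2.11a), (2.11b), and (2.11c) from the definitions (2.5)-(2.8) and (2.12); the function name and the test values of the statistics and precisions are hypothetical.

```python
def exponent_forms(n, y, yy, m, omega, tau, eta):
    """Evaluate A(Omega, D) in the forms (2.11a), (2.11b), (2.11c).

    n  : sample sizes n_i;  y : sample means y_i;  yy : mean squares y_ii
    m  : prior mean;  omega, tau, eta : the three precisions.
    """
    z = [omega * ni / (tau + omega * ni) for ni in n]            # (2.5)
    N = sum(n)
    Z = sum(z)
    zbar = sum(ni * zi for ni, zi in zip(n, z)) / N              # z in (2.12)
    Y00 = sum(ni * v for ni, v in zip(n, yy)) / N                # (2.12)
    Y0 = sum(zi * yi for zi, yi in zip(z, y)) / Z                # (2.8)
    z0 = tau * Z / (eta + tau * Z)                               # (2.7)
    f0 = (1 - z0) * m + z0 * Y0                                  # (2.6)
    S = sum(ni * zi * yi ** 2 for ni, zi, yi in zip(n, z, y))
    a = omega * (N * Y00 - S) + eta * (m ** 2 - f0 ** 2 / (1 - z0))
    b = omega * (N * Y00 - S) + eta * z0 * (m - Y0) ** 2 - tau * Z * Y0 ** 2
    spread = sum(ni * zi * (Y0 ** 2 - yi ** 2)
                 for ni, zi, yi in zip(n, z, y)) / N
    c = N * omega * ((Y00 - Y0 ** 2) + spread
                     + (1 - zbar) * (1 - z0) * (m - Y0) ** 2)
    return a, b, c


# Hypothetical statistics for three risks of unequal length.
a, b, c = exponent_forms(n=[3, 5, 2], y=[1.0, 2.5, 0.4], yy=[1.6, 7.0, 0.5],
                         m=1.5, omega=2.0, tau=0.7, eta=0.3)
```

The three returned values should agree to rounding error for any positive precisions; the agreement rests on the identities $\omega N (1 - z) = \tau Z$ and $\eta z_0 = \tau Z (1 - z_0)$.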


3.  Partially  Unconditional  Parameter  Posteriors  and  Forecasts 


Before progressing to various special cases, we first find several
useful "marginal" densities, conditional only upon $\Omega$ and $\mathcal{D}$.

Noting that the exponent of (2.2) is quadratic in $\varphi$, and that
$p(\mu \mid \varphi, \Omega, \mathcal{D})$ is normal, we deduce that $p(\mu \mid \Omega, \mathcal{D})$ must be multinormal, with
moments defined as, say:

(3.1)  $\mathcal{E}\{\tilde\mu \mid \Omega, \mathcal{D}\} = f(\Omega, \mathcal{D})$ ;  $\mathcal{V}\{\tilde\mu \mid \Omega, \mathcal{D}\} = \Sigma(\Omega, \mathcal{D})$ .

Unconditioning (2.3)(2.4) using (2.6)(2.9), we find, for the components of
the mean vector:

(3.2)  $[f(\Omega, \mathcal{D})]_i = f_i(\Omega, \mathcal{D}) = [1 - z_i(\omega,\tau)]\, f_0(\Omega, \mathcal{D}) + z_i(\omega,\tau)\, y_i$ ;

and, for the components of the covariance matrix:

(3.3)  $[\Sigma(\Omega, \mathcal{D})]_{ij} = \sigma_{ij}(\Omega, \mathcal{D}) =$
       $\tau^{-1}[1 - z_i(\omega,\tau)] + \eta^{-1}[1 - z_i(\omega,\tau)]^2 [1 - z_0(\Omega)]$   $(i = j)$
       $\eta^{-1}[1 - z_i(\omega,\tau)][1 - z_j(\omega,\tau)][1 - z_0(\Omega)]$   $(i \neq j)$

with i,j = 1,2,...,r. (Henceforth, we shall omit obvious ranges on
indices.) Again, we see that even these partially unconditioned
covariances are still homoscedastic.

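It is also important to be able to find the predictive densities for
future values of the individual risk variables. Let $\tilde w_{it} = \tilde x_{it}$ for any
$t > n_i$. Since $\mathcal{E}\{\tilde w_{it} \mid \Theta\} = \mu_i$ and $\mathcal{V}\{\tilde w_{it} \mid \Theta\} = \omega^{-1}$, for all i and all future t,
it follows from the above that $p(w \mid \Omega, \mathcal{D})$ is also multinormal, with means:

(3.4)  $\mathcal{E}\{\tilde w_{it} \mid \Omega, \mathcal{D}\} = f_i(\Omega, \mathcal{D})$

and covariances:

(3.5)  $\mathcal{C}\{\tilde w_{it}; \tilde w_{ju} \mid \Omega, \mathcal{D}\} =$
       $\omega^{-1} + \sigma_{ii}(\Omega, \mathcal{D})$   $(i = j)$ $(t = u)$
       $\sigma_{ii}(\Omega, \mathcal{D})$   $(i = j)$ $(t \neq u)$
       $\sigma_{ij}(\Omega, \mathcal{D})$   $(i \neq j)$

for all $t > n_i$ and $u > n_j$.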
When considering the entire cohort, it may be useful to predict the
total cohort risk sum, $\sum_i \tilde w_{it}$, or, more usually, the cohort-average future
risk, $\tilde w_0 = r^{-1} \sum_i \tilde w_{it}$ (for any future time $t > \max(n_i)$). Defining a
cohort-average credibility factor,

(3.6)  $z_c(\Omega) = 1 - [1 - z_0(\Omega)][1 - r^{-1} \sum_i z_i(\omega,\tau)] = [1 + \eta (r\tau)^{-1}]\, z_0(\Omega)$ ,

we combine (3.2) and (2.6) to find:

(3.7)  $\mathcal{E}\{\tilde w_0 \mid \Omega, \mathcal{D}\} = f_c(\Omega, \mathcal{D}) = [1 - z_c(\Omega)]\, m + z_c(\Omega)\, Y_0(\omega,\tau)$ ,

and:

(3.8)  $\mathcal{V}\{\tilde w_0 \mid \Omega, \mathcal{D}\} = (r\omega)^{-1} + r^{-2} \sum_i \sum_j \sigma_{ij}(\Omega, \mathcal{D})$ .

Note that forecasting $\tilde w_0$ is not the same as estimating $\tilde\varphi$, since the former
is from a finite, possibly biased, sample of individual risks that is
fixed once and for all. The reader may easily find the covariance between
successive future values of $\tilde w_0$.
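The conditional moments of this section are straightforward to tabulate for given precisions. The following sketch, with hypothetical function name and inputs, builds the mean vector (3.2), the covariance matrix (3.3), and the cohort-average quantities (3.6)-(3.8), so that the two expressions for $z_c$ in (3.6) can be compared numerically.

```python
def conditional_moments(n, y, m, omega, tau, eta):
    """Moments of (mu | Omega, D) and of the cohort average (w_0 | Omega, D)."""
    r = len(n)
    z = [omega * ni / (tau + omega * ni) for ni in n]           # (2.5)
    Z = sum(z)
    z0 = tau * Z / (eta + tau * Z)                              # (2.7)
    Y0 = sum(zi * yi for zi, yi in zip(z, y)) / Z               # (2.8)
    f0 = (1 - z0) * m + z0 * Y0                                 # (2.6)
    f = [(1 - zi) * f0 + zi * yi for zi, yi in zip(z, y)]       # (3.2)
    sigma = [[(1 - z[i]) * (1 - z[j]) * (1 - z0) / eta
              + ((1 - z[i]) / tau if i == j else 0.0)
              for j in range(r)] for i in range(r)]             # (3.3)
    zc = 1 - (1 - z0) * (1 - Z / r)                             # (3.6), first form
    fc = (1 - zc) * m + zc * Y0                                 # (3.7)
    var_w0 = 1 / (r * omega) + sum(map(sum, sigma)) / r ** 2    # (3.8)
    return {"f": f, "sigma": sigma, "z0": z0, "zc": zc,
            "fc": fc, "var_w0": var_w0}


# Hypothetical inputs: three risks, known precisions.
out = conditional_moments(n=[3, 5, 2], y=[1.0, 2.5, 0.4], m=1.5,
                          omega=2.0, tau=0.7, eta=0.3)
```

None of the entries of `sigma` or `var_w0` involve the data values y, which is the homoscedasticity noted in the text; only the means `f` and `fc` respond to the observations.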


4.  The  Constant  Variance  Model 

As  our  first  use  of  the  above  formulae,  we  note  the  obvious  fact  that 
setting: 
(4.1)  $\omega^{-1} = f$ ;  $\tau^{-1} = g$ ;  $\eta^{-1} = h$

will  give  the  full-distribution  results  for  the  simple  hierarchical  model 
described  in  Section  1.  We  get  the  simplification  Yq  =  y^ ,  and  the 
various  credibility  factors  become  simply: 


(4.2)  $z_i = \frac{n_i}{n_0 + n_i}$ ;  $z_0 = \frac{\sum_i z_i}{r_0 + \sum_i z_i}$ ;  $\bar z = r^{-1} \sum_i z_i$ ;  $z_c = \left[1 + \frac{r_0}{r}\right] z_0$ ,

where $n_0 = f/g$ and $r_0 = g/h$. $p(\mu \mid \mathcal{D})$, $p(\varphi \mid \mathcal{D})$, $p(w \mid \mathcal{D})$, and $p(w_0 \mid \mathcal{D})$ are
multinormal, with means given by (3.2), (2.6), (3.4), and (3.7), respectively.
Interpretation of these results will be given below in Section 9.

As mentioned previously, all of the variances and covariances of
these r.v.'s are homoscedastic, in the sense that they depend upon the
sampling design parameters $(n_i, r)$, but not upon the actual data values in
$\mathcal{D}$. For instance, (3.3) simplifies to:

(4.3)  $\mathcal{C}\{\tilde\mu_i; \tilde\mu_j \mid \mathcal{D}\} = \sigma_{ij}(\mathcal{D}) =$
       $g(1 - z_i) + h(1 - z_i)^2 (1 - z_0)$   $(i = j)$
       $h(1 - z_i)(1 - z_j)(1 - z_0)$   $(i \neq j)$

5.  A Special Non-Hierarchical Model

For our second special case, let us assume that knowledge about $\tilde\varphi$ is
"tight", by letting $\eta \to \infty$. This makes $\tilde\varphi \to m$, almost surely, and
effectively removes the hierarchical nature of the model. Thus,
$\tilde\mu_i \sim \mathrm{No}(m; \tau^{-1})$, and the unknown parameters reduce to $\tilde\Theta = (\tilde\mu; \tilde\omega, \tilde\tau)$. Note
also that, in the limit, $z_0 \to 0$, but that $(\eta z_0) \to \tau \sum_i z_i(\omega,\tau)$.
Furthermore, the $z_i(\omega,\tau)$ still depend upon both unknown precisions, so
that the conditional joint density (2.10) is still rather complicated.

However, by examining the individual credibility factors (2.5), we
see that they, in fact, depend only upon the ratio $\tau/\omega$ of the remaining
two unknown precisions! Therefore, if we additionally assume that $\tilde\omega$, say,
has a prior density $p(\omega)$, and that the ratio is fixed at some positive
value $n_0$:

(5.1)  $\tau = n_0\, \omega$

we obtain a great simplification, in that the $z_i = n_i/(n_0 + n_i)$ are now
independent of the precision parameters, as are the forecasts $f_i(\mathcal{D})$ in
(3.2) (with $f_0 = m$), and the sufficient statistic $Y_0$ in (2.6) and (3.7).
Thus, $p(\mu \mid \omega, \mathcal{D})$ is multinormal with means $f_i(\mathcal{D})$ and simplified covariances:

(5.2)  $\mathcal{C}\{\tilde\mu_i; \tilde\mu_j \mid \omega, \mathcal{D}\} = \sigma_{ij}(\omega, \mathcal{D}) =$
       $(\omega n_0)^{-1} (1 - z_i)$   $(i = j)$
       $0$   $(i \neq j)$

In other words, the $(\tilde\mu_i \mid \omega, \mathcal{D})$ are conditionally independent. Similar
results hold for the moments of the $\{\tilde w_{it}\}$ and for $\tilde w_0$, where now
$z_c = r^{-1} \sum_i z_i$.

More importantly, (2.10) now simplifies to:

(5.3)  $p(\omega \mid \mathcal{D}) \propto p(\omega)\, \omega^{N/2} \exp\{-A(\omega, \mathcal{D})/2\}$

with the exponent:
(5.4)  $A(\omega, \mathcal{D}) = \omega N \left\{ [Y_{00} - Y_0^2] + (1 - z)(1 - z_0)(m - Y_0)^2 + N^{-1} \sum_i n_i z_i (Y_0^2 - y_i^2) \right\} = \omega N B(\mathcal{D})$ , say,

now a linear function of $\omega$! Thus, unconditioning w.r.t. $\tilde\omega$ can be carried
out exactly, if we additionally assume that the natural conjugate gamma
prior is used, that is:

(5.5)  $p(\omega) = \frac{\beta^{\alpha}\, \omega^{\alpha - 1}\, e^{-\beta\omega}}{\Gamma(\alpha)}$ ,

which we write as $\tilde\omega \sim \mathrm{Ga}(\alpha, \beta)$.

It then follows that the posterior-to-data density is closed under
sampling, and $(\tilde\omega \mid \mathcal{D}) \sim \mathrm{Ga}(\alpha', \beta')$, with updated parameters:

(5.6)  $\alpha' = \alpha + N/2$ ;  $\beta' = \beta + N B(\mathcal{D})/2$ .

In this way, the completely unconditional densities $p(\mu \mid \mathcal{D})$, $p(w \mid \mathcal{D})$, and
$p(w_0 \mid \mathcal{D})$ can be obtained as Student-t densities, and variances computed
with the help of $\mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\} = \beta'/(\alpha' - 1)$.

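We say that $\tilde u$ has the r-dimensional multivariate Student-t density
with $\nu$ degrees of freedom, mean vector $m$, and precision matrix $\Psi$, written
$\tilde u \sim \mathrm{St}_r(\nu; m; \Psi)$, if:

(5.7)  [...]

with, of course, $\mathcal{E}\{\tilde u\} = m$, and $\mathcal{V}\{\tilde u\} = \Psi^{-1}$. (r=1 is the ordinary
Student-t density.) The proof that $p(\mu \mid \mathcal{D})$ has this form follows from the
fact that $\omega$ does not enter into the $f_i(\mathcal{D})$; it also follows that the $(\tilde\mu_i \mid \mathcal{D})$
are still independent but with density $\mathrm{St}_1(\nu; m_i; \psi_i)$, where:

(5.8)  $\mathcal{E}\{\tilde\mu_i \mid \mathcal{D}\} = m_i = f_i(\mathcal{D})$ ,

(5.9)  $\mathcal{V}\{\tilde\mu_i \mid \mathcal{D}\} = \psi_i^{-1} = (n_0 + n_i)^{-1}\, \mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\}$ .

A separate analysis shows there are $\nu = 2\alpha' = 2\alpha + N$ degrees of freedom.

Similarly, $(\tilde w_{it} \mid \mathcal{D}) \sim \mathrm{St}_1\left(2\alpha + N;\ f_i(\mathcal{D});\ \{[1 + (n_0 + n_i)^{-1}]\, \mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\}\}^{-1}\right)$,
for all $t > n_i$.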

To see these results in a more familiar context, consider the case of one
risk, in which:

(5.10)  $Y_0 = y_1$ ;  $Y_{00} = y_{11}$ ;  $Z = z = z_1$ ;  $N = n_1$ ,

and the exponent factor in (5.4) becomes simply:

(5.11)  $B(\mathcal{D}) = (y_{11} - y_1^2) + (1 - z_1)(m - y_1)^2$ .

Then, the posterior-to-data variance of any future value of $\tilde x_{1t}$ can be
predicted from the credibility formula:

(5.12)  $\mathcal{V}\{\tilde w_{1t} \mid \mathcal{D}\} = \left(\frac{n_0 + n_1 + 1}{2\alpha - 2 + n_1}\right) \left\{ \left(\frac{2(\alpha - 1)}{n_0 + 1}\right)(1 - z_1)\, v + z_1 (1 - z_1)(m - y_1)^2 + z_1 (y_{11} - y_1^2) \right\}$ ,

where $v = \mathcal{V}\{\tilde w_{1t}\} = (1 + n_0^{-1})\, \mathcal{E}\{\tilde\omega^{-1}\}$. If we take the "natural" choice of
$\alpha = (n_0 + 3)/2$ (see remarks in Jewell & Schnieper [1985]), we obtain the
well-known natural-conjugate results for the one-dimensional normal with
unknown mean and variance. In other words, the results of this section can
be viewed as the generalization of this simple individual risk model when
collateral data is available.
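For the single-risk case, the credibility form (5.12) can be compared with the direct route through the gamma update (5.6). The sketch below is illustrative only: the function name and the prior values are assumptions, and it requires $\alpha > 1$ so that the prior mean of $\omega^{-1}$ exists.

```python
def single_risk_predictive_variance(x, m, n0, alpha, beta):
    """Predictive variance of a future observation for one risk.

    Returns (direct, cred): the value from the gamma update (5.6) and the
    value from the credibility formula (5.12); the two should coincide.
    """
    n1 = len(x)
    y1 = sum(x) / n1                                   # sample mean
    y11 = sum(v * v for v in x) / n1                   # sample mean square
    z1 = n1 / (n0 + n1)
    B = (y11 - y1 ** 2) + (1 - z1) * (m - y1) ** 2     # (5.11)
    alpha_post = alpha + n1 / 2                        # (5.6)
    beta_post = beta + n1 * B / 2
    direct = (1 + 1 / (n0 + n1)) * beta_post / (alpha_post - 1)
    v = (1 + 1 / n0) * beta / (alpha - 1)              # prior predictive variance
    cred = (n0 + n1 + 1) / (2 * alpha - 2 + n1) * (
        2 * (alpha - 1) / (n0 + 1) * (1 - z1) * v
        + z1 * (1 - z1) * (m - y1) ** 2
        + z1 * (y11 - y1 ** 2))                        # (5.12)
    return direct, cred


# Hypothetical example with the "natural" choice alpha = (n0 + 3) / 2.
direct, cred = single_risk_predictive_variance(
    [1.0, 3.0], m=0.0, n0=2.0, alpha=2.5, beta=3.0)
# Both routes give 3.0 here.
```

With the "natural" $\alpha$, the leading ratio and the coefficient of v both equal one, so the predictive variance is a plain mixture of the prior variance, the squared prediction error, and the sample variance.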


6.  A  General  Hierarchical  Model  with  All  Variances  Linked 

We are now ready to tackle a general heteroscedastic hierarchical
model. The success of the non-hierarchical model of the last section was
due primarily to assumption (5.1), which removes the dependency of the $z_i$
upon any random quantity. This suggests that an appropriate assumption in
the hierarchical case would be to link all three precisions together, by
assuming:

(6.1)  $\tau = n_0\, \omega$ ;  $\eta = r_0\, \tau = r_0 n_0\, \omega$ ,

for appropriate positive $n_0$ and $r_0$. This is equivalent to saying that we
have very tight knowledge about how the total variance is split up among
the three levels of the model, but the value of the total variance is
unknown. This assumption not only simplifies the various credibility
factors, which reduce to the constant forms $z_i = n_i/(n_0 + n_i)$ and
$z_0 = \sum_i z_i/(r_0 + \sum_i z_i)$ of (4.2), but also makes the $\{f_i\}$, $f_0$, $Y_0$, and $f_c$
independent of $\omega$! Thus, the unconditioning on $\Omega$ required in the general
formulae of Sections 2 and 3 reduces to simple expectations over $(\tilde\omega \mid \mathcal{D})$,
with the possibility of using the natural conjugate Gamma prior of (5.5)!
For completeness, we now display all of the final results, using notation
and arguments developed previously.

6.1  Posterior  Parameter  Results 

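If $p(\omega)$ is $\mathrm{Ga}(\alpha, \beta)$, then from (2.11c) we see that $p(\omega \mid \mathcal{D})$ is
$\mathrm{Ga}(\alpha', \beta')$, with updated parameters:

(6.3)  $\alpha' = \alpha + N/2$ ;  $\beta' = \beta + N B(\mathcal{D})/2$ ;
       $B(\mathcal{D}) = [Y_{00} - Y_0^2] + (1 - z)(1 - z_0)[m - Y_0]^2 + N^{-1} \sum_i n_i z_i (Y_0^2 - y_i^2)$ ,

similar to (5.4). The most important moment formula for our purposes is
$\mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\} = \beta'/(\alpha' - 1)$, which we note can be written:

(6.4)  $\mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\} = (1 - z_\omega)\, \mathcal{E}\{\tilde\omega^{-1}\} + z_\omega\, B(\mathcal{D})$ ,

where we have eliminated $\beta$ in favor of the prior mean variance and have
defined a variance credibility factor:

(6.5)  $z_\omega = N/(N + 2(\alpha - 1))$ .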
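The variance credibility form (6.4) can be checked against the direct gamma update (6.3) with a short calculation. In the sketch below the function name and all numerical inputs are hypothetical, and $\alpha > 1$ is assumed so that the prior mean variance $\beta/(\alpha - 1)$ exists.

```python
def posterior_mean_variance(n, y, yy, m, n0, r0, alpha, beta):
    """Posterior mean of 1/omega in the linked-precision model (6.1).

    Returns (direct, cred, z_omega): the gamma-update value from (6.3),
    the credibility mixture (6.4), and the factor (6.5).
    """
    z = [ni / (n0 + ni) for ni in n]                     # constant credibilities
    N = sum(n)
    Z = sum(z)
    zbar = sum(ni * zi for ni, zi in zip(n, z)) / N      # z in (2.12)
    z0 = Z / (r0 + Z)
    Y0 = sum(zi * yi for zi, yi in zip(z, y)) / Z
    Y00 = sum(ni * v for ni, v in zip(n, yy)) / N
    B = ((Y00 - Y0 ** 2)
         + (1 - zbar) * (1 - z0) * (m - Y0) ** 2
         + sum(ni * zi * (Y0 ** 2 - yi ** 2)
               for ni, zi, yi in zip(n, z, y)) / N)      # (6.3)
    direct = (beta + N * B / 2) / (alpha + N / 2 - 1)    # beta' / (alpha' - 1)
    z_omega = N / (N + 2 * (alpha - 1))                  # (6.5)
    cred = (1 - z_omega) * beta / (alpha - 1) + z_omega * B   # (6.4)
    return direct, cred, z_omega


# Hypothetical portfolio statistics and prior parameters.
direct, cred, z_omega = posterior_mean_variance(
    n=[3, 5, 2], y=[1.0, 2.5, 0.4], yy=[1.6, 7.0, 0.5],
    m=1.5, n0=2.0, r0=1.5, alpha=3.0, beta=4.0)
```

Unlike the mean credibilities, $z_\omega$ grows with the total number of observations N in the portfolio, so every risk contributes to learning the common scale.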

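From (2.6) and (2.9), we find the posterior-to-data moments of the
portfolio mean to be:

(6.6)  $\mathcal{E}\{\tilde\varphi \mid \mathcal{D}\} = f_0(\mathcal{D}) = (1 - z_0)\, m + z_0 Y_0$ ,

(6.7)  $\mathcal{V}\{\tilde\varphi \mid \mathcal{D}\} = \frac{1 - z_0}{r_0 n_0}\, \mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\}$ .

In fact, because of the position of $\omega$ in the simplified version of
$p(\varphi \mid \Omega, \mathcal{D})$, we see that $p(\varphi \mid \mathcal{D})$ must be a one-dimensional Student-t density,
with $2\alpha' = 2\alpha + N$ degrees of freedom, and the above mean and variance.

Progressing to the individual risk means, we use the results of
Section 3 to find the individual mean forecasts as before:

(6.8)  $\mathcal{E}\{\tilde\mu_i \mid \mathcal{D}\} = f_i(\mathcal{D}) = (1 - z_i)\, f_0(\mathcal{D}) + z_i y_i$

and the new covariance structure:

(6.9)  $\mathcal{C}\{\tilde\mu_i; \tilde\mu_j \mid \mathcal{D}\} = \sigma_{ij}(\mathcal{D}) = \left[ \delta_{ij}\, \frac{(1 - z_i)}{n_0} + \frac{(1 - z_i)(1 - z_j)(1 - z_0)}{r_0 n_0} \right] \mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\}$ ,

where $\delta_{ij}$ is the Kronecker delta-function, and the data values enter
through the use of (6.3)-(6.5). Again, because of the normal-gamma
distributional assumption, one can argue that $p(\mu \mid \mathcal{D})$ is an r-dimensional
Student-t density, with $2\alpha + N$ degrees of freedom.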

6.2  Posterior  Predictive  Results 

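Passing to the forecasts of future values, (3.4)(3.5) give us
simply:

(6.10)  $\mathcal{E}\{\tilde w_{it} \mid \mathcal{D}\} = f_i(\mathcal{D}) = (1 - z_i)\, f_0(\mathcal{D}) + z_i y_i$ ,

(6.11)  $\mathcal{C}\{\tilde w_{it}; \tilde w_{ju} \mid \mathcal{D}\} = \delta_{ij}\, \delta_{tu}\, \mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\} + \sigma_{ij}(\mathcal{D})$ .

From these, various multivariate Student-t densities can be generated
"over future times" for a single risk, "over risks" at one future epoch,
or for various mixtures of risks and epochs.

Finally, the average future risk for the entire cohort (3.7)(3.8) has
moments:

(6.12)  $\mathcal{E}\{\tilde w_0 \mid \mathcal{D}\} = f_c(\mathcal{D}) = (1 - z_c)\, m + z_c Y_0$ ,

(6.13)  $\mathcal{V}\{\tilde w_0 \mid \mathcal{D}\} = \left\{ \frac{1}{r} \left[ 1 + \frac{1}{n_0} \left(1 - r^{-1} \sum_i z_i\right) \right] + \frac{1 - z_0}{r_0 n_0} \left[ 1 - r^{-1} \sum_i z_i \right]^2 \right\} \mathcal{E}\{\tilde\omega^{-1} \mid \mathcal{D}\}$ .

It follows also from arguments similar to those above that $p(w_0 \mid \mathcal{D})$ will be
a Student-t density.

Any of the final (co)variance formulae can be put into a
credibility-like form. For example, the formula corresponding to (5.12)
for the total variance of a future observation of risk #1 can be written
as:

$\mathcal{V}\{\tilde w_{1t} \mid \mathcal{D}\} = \frac{r_0 n_0 + (1 - z_1)\, r_0 + (1 - z_0)(1 - z_1)^2}{r_0 n_0 + r_0 + 1} \left\{ (1 - z_\omega)\, v + z_\omega\, \frac{r_0 n_0 + r_0 + 1}{r_0 n_0}\, B(\mathcal{D}) \right\}$ ,

where $v = \mathcal{V}\{\tilde w_{1t}\} = [(r_0 n_0 + r_0 + 1)/(r_0 n_0)]\, \mathcal{E}\{\tilde\omega^{-1}\}$ is the prior variance of an
observation and $B(\mathcal{D})$ is given in (6.3).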


7.  Similar  Data  Lengths 

Notation simplifies dramatically if all data record lengths are the
same, i.e., $n_i = n$ $(i=1,2,\ldots,r)$. Then:

(7.1)  $N = nr$ ;  $z_i = z = n/(n_0 + n)$ ;  $Z = rz$ ;  $z_0 = rz/(r_0 + rz)$ ;  $z_\omega = nr/(nr + 2(\alpha - 1))$ ,

and the overall statistics $Y_0$ and $Y_{00}$ can be replaced by the simpler
versions:

(7.2)  $y_0 = \sum_i y_i/r = \sum_i \sum_t x_{it}/nr$ ;  $y_{00} = \sum_i y_{ii}/r = \sum_i \sum_t x_{it}^2/nr$ .

Then (6.3) simplifies to:

(7.3)  $B(\mathcal{D}) = y_{00} - (1 - z)\, y_0^2 - z \left(\sum_i y_i^2/r\right) + (1 - z)(1 - z_0)(m - y_0)^2$ .

For approximations and asymptotic studies, it is useful to eliminate $y_0^2$
and $\sum_i y_i^2/r$ in favor of the statistics:


(7.4)  Vo  *  Iy!;i/r  ;  Vi  '  Xit 


*  P-»] 


2  2  yiyj 
i<J  J 


from  which  one  can  show  that: 


2  lrl  _  2,  ,r-K  1_  2  1  ^  /n~li 

(7.5)  y_  =  Iyt )  +  (— . )  y^  ;  =  r  y_  +  (— )  y„.„ 
$B(\mathcal{D})$ and the variances above can be manipulated into various useful forms.
For instance, $B(\mathcal{D})$ can be expressed solely in terms of differences
between these statistics, the various credibility factors, and the design
parameters, n and r.


8.  Single  Unknown  Variance 

To illustrate the difficulties caused by more general precision
structures, we consider the model analyzed by Berger [1985, 4.6], in which
it is assumed (in our notation) that $\omega = f^{-1}$ and $\eta = h^{-1}$ are known with
certainty, so that the unknown parameters are $(\tilde\mu, \tilde\varphi, \tilde\tau)$. For simplicity, we
assume all data lengths are equal to n (Berger has n = 1).

Thus there are only two conditional credibility factors to consider:

(8.1)  $z_1(\tau) = \frac{n}{f\tau + n}$ ;  $z_0(\tau) = \frac{rnh\,\tau}{(rnh + f)\,\tau + n}$ ;

plus the individual statistics $(y_i, y_{ii})$ and the simplified portfolio
statistics $(y_0, y_{00})$. Other factors in (2.12) are simply: $N = nr$,
$z(\tau) = z_1(\tau)$, and $Z(\tau) = r z_1(\tau)$.

The posterior precision density (2.10) now becomes a one-dimensional
formula that can be "simplified" to reveal the structure on $\tau$:

(8.2)  $p(\tau \mid \mathcal{D}) \propto p(\tau) \left[\frac{f\tau}{f\tau + n}\right]^{r/2} \left[\frac{f\tau + n}{(rnh + f)\,\tau + n}\right]^{1/2} \exp\{-C(\tau)/2\}$ ;

(8.3)  $C(\tau) = (rnf^{-1}) \left\{ \left(\frac{n}{f\tau + n}\right) \left[y_0^2 - \sum_i y_i^2/r\right] + \left[\frac{f\tau}{(rnh + f)\,\tau + n}\right] (m - y_0)^2 \right\}$ .
It is clear that no analytical prior will lead to a tractable posterior
density, so that numerical methods will have to be used in this case.
However, examination of the results of Section 3 shows that the essential
work in finding the various posterior and predictive first and second
moments will be in taking expectations of various powers of the two
credibility factors, plus finding $\mathcal{E}\{\tilde\tau^{-1} \mid \mathcal{D}\}$. The situation is similar if
either $\omega$ or $\eta$ is the single unknown precision, except that, in the latter
case, only $z_0$ will be modified by the data.
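To indicate what the numerical work looks like, the sketch below integrates the one-dimensional posterior of $\tau$ on a grid with the trapezoid rule and returns the posterior expectation of $z_1(\tau)$. The density is the one implied by (2.10) when $\omega = 1/f$ and $\eta = 1/h$ are known and all record lengths equal n. The gamma prior on $\tau$, the grid, the function name, and the data values are all assumptions made for illustration.

```python
import math


def expected_individual_credibility(y, n, m, f, h, a, b, grid):
    """Posterior mean of z_1(tau) = n / (f*tau + n) by grid integration.

    y : sample means y_i of the r risks (each based on n observations)
    f, h : known variances at the observation and portfolio levels
    a, b : shape and rate of an assumed Ga(a, b) prior on tau
    grid : increasing list of positive tau values
    """
    r = len(y)
    y0 = sum(y) / r
    s2 = sum(v * v for v in y) / r

    def log_post(tau):
        d1 = f * tau + n
        d0 = (r * n * h + f) * tau + n
        # tau-dependent part of the exponent A in (2.10)
        C = (r * n / f) * (n / d1 * (y0 ** 2 - s2)
                           + f * tau / d0 * (m - y0) ** 2)
        log_prior = (a - 1) * math.log(tau) - b * tau
        return (log_prior + 0.5 * r * math.log(f * tau / d1)
                + 0.5 * math.log(d1 / d0) - 0.5 * C)

    logs = [log_post(t) for t in grid]
    top = max(logs)                                  # guard against underflow
    dens = [math.exp(v - top) for v in logs]

    def trapezoid(vals):
        return sum(0.5 * (vals[k] + vals[k + 1]) * (grid[k + 1] - grid[k])
                   for k in range(len(grid) - 1))

    z1 = [n / (f * t + n) for t in grid]
    return trapezoid([d * z for d, z in zip(dens, z1)]) / trapezoid(dens)


# Hypothetical data: five risks, four observations each.
grid = [0.01 * k for k in range(1, 5001)]
ez1 = expected_individual_credibility(
    y=[1.2, 0.4, 2.9, 1.7, 0.8], n=4, m=1.0, f=2.0, h=1.0,
    a=2.0, b=1.0, grid=grid)
```

Expectations of other powers of $z_1(\tau)$ and $z_0(\tau)$ follow by replacing the integrand; a finer grid or an adaptive rule would be advisable when the posterior is sharply peaked.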


9.  Interpretation  of  Results 

As  previously  mentioned,  the  various  credibility  results  (the 
formulae  with  notation  f(S))  show  that  predictive  means  can  be  represented 
as  linear  convex  combinations  of  a  prior  mean  with  a  classical  estimator 
for  that  mean.  With  special  priors,  this  may  even  be  true  for  an  updated 
variance  (6.4).  As  the  number  of  data  points  increases,  the  credibility 
factor  increases,  so  that  we  place  more  "credibility"  in  the  classical 
estimator, until (in the simplest cases) the Bayes estimator is
essentially the sampling-theory estimator. The rate at which this
"learning"  occurs  depends  upon  a  credibility  "time  constant"  that 
describes  how  our  prior  uncertainty  is  split  between  observational 
variation  and  prior  uncertainty  in  the  mean  parameter. 

In the hierarchical model with known variances, (3.2) and (2.6)
reveal that the individual mean forecasts, $f_i(\mathcal{D})$, are linear combinations
between the classical estimators, $y_i$, and another credibility
forecast for the portfolio-level mean, $\tilde\varphi$. $f_0(\mathcal{D})$ itself mixes a
"universal" mean, m, with a portfolio-wide statistic, $Y_0$, according to a
cohort learning curve, $z_0$, with a time constant that is the ratio of
higher level variances. Note that the individual data $\{x_{it}\}$ from risk i
all have the same unity weight over time in calculating the
individual-level statistic, $y_i$, but that these then have differing
weights, $z_i$, over risks in calculating the portfolio statistic, $Y_0$.
Further, the $z_i$ increase with increasing numbers of samples, $n_i$, but $z_0$
increases only with increasing sum $\sum_i z_i$ over the number of risks, r! It
can be shown that this appealing layer-by-layer compounding of credibility
forecasts that depend only upon local sources of variation and lower-level
credibility factors repeats itself in more general hierarchical models
with many layers (Bühlmann & Jewell [1987]). An important practical
remark is that, while the $z_i$ approach unity with increasing $n_i$, this is
not true at the portfolio level, since $z_0$ remains less than unity for
finite r. In other words, because our finite portfolio may represent a
biased selection of possible $\{\tilde\theta_i\}$, $Y_0$ is not "fully credible" for $\tilde\varphi$, even
with very large numbers of data points per risk.

In fact, it was examination of the credibility factors that revealed
the essential simplification in the generalized hierarchical model (2.2).
In order to keep the intuitive and appealing linear formulae for the
predictive means, we must constrain the ratios of adjacent-level
precisions so that the credibility time constants, $n_0$ and $r_0$, will remain
constant, so that only one precision is unknown! This has the very useful
byproduct of making the exponent A in (2.10)(2.11) linear in that
precision, so that the Gamma is a natural-conjugate prior density. The
notation $n_0$ and $r_0$ suggests that the prior parameter densities behave as
if prior information were equivalent to $r_0$ risks each having $n_0$ data
samples with common value, m. Similar constraints are necessary in more
complicated hierarchical models to keep the predictive means in
credibility form; full details for the Lindley & Smith [1972] linear model
will be in a forthcoming paper.

There  are  many  other  combinations  of  model  and  prior  densities  that 
have  simple  credibility  forecasts  using  individual  data.  However,  the 
author  has  been  unable  to  extend  these  models  to  the  hierarchical  case  and 
still  retain  the  linear  structure  for  a  portfolio-level  forecast  using 
collateral  data.  I  would  be  interested  in  hearing  from  others  who  have 
looked  further  into  special  structures  for  first-  and  second-moment 
forecasts  in  hierarchical  models. 